Skip to content

Conversation

@gorskysd
Copy link
Contributor

@gorskysd gorskysd commented Sep 8, 2022

Added bypass to StageSubmit events that have no "Submission Time" key.

Increased the (uncompressed) file size limit to 20GB (test log was 11GB Uncompressed).

@gorskysd gorskysd requested review from NKSync and rmoneys September 8, 2022 19:47
with tempfile.TemporaryDirectory() as temp_dir:
with self.assertRaises(ValueError, msg="Expected DBC event not found"):
EventLogBuilder(event_log.as_uri(), temp_dir).build()
self.check_value_error(event_log, "No rollover properties found in log file")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea moving this to a function to promote reusability


with open(event_log) as log_fobj:
event = json.loads(log_fobj.readline())
assert event["Event"] == "DBCEventLoggingListenerMetadata", "Expected first event is present"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the assertion error message correct? "Expected first event is present"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, accidentally deleted that when I was having troubles at first (wrong indentation on the assertion). It's back in there now. Neal also suggested turning all the error messages to negatives ("event missing" vs ("event presenet"), which I did throughout.

assert all(
key in event
for key in ["Event", "Spark Version", "Timestamp", "Rollover Number", "SparkContext Id"]
), "All keys are present"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, the message states "All keys are present" on an assertion error. Shouldn't it be something like "A key was missing"?

Also, is it important to check for SparkContext Id?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to check for SparkContext Id yet, might be something in the future though. Good catch on the error message.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good call out Neal. In the Java shop I worked at we used the message to describe the assertion like,

assertTrue(5 > 4, "5 is greater the 4");
assertNull("car is null", car);

(looks like the semantics were left to the organization though: https://softwareengineering.stackexchange.com/questions/301652/purpose-of-assertion-messages)

I don't remember the output when they failed, but in Python it looks like something that describes the error is better, e.g.

>>> assert "No rollover properties found in log file" == "Rollover file appears to be missing", "Exception message doesn't match"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError: Exception message doesn't match

And the documentation appears to support that: https://docs.python.org/3/reference/simple_stmts.html#grammar-token-python-grammar-assert_stmt

Copy link
Contributor

@rmoneys rmoneys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

self.source_url, self.work_dir, self.s3_client, extract_thresholds
)

def _validate_event_log_paths(self, event_log_paths: Path | str) -> Path:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like list is missing in the type annotations

event_log_paths = extractor.Extractor(event_log_path.as_uri(), temp_dir).extract()
eventlog.EventLogBuilder(event_log_paths, temp_dir).build()

assert str(cm.exception) == msg # , "Exception message matches"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious - why drop the message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops that was an oversight. Neal also suggested switching the messages to a negative phrasing ("eventlog missing" vs "eventlog present") since its more accurate. I did that throughout.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Sep 9, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 3 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@gorskysd gorskysd merged commit a974926 into main Sep 9, 2022
@gorskysd gorskysd deleted the bugfix/PROD-426-spark-log-parser-error-key-error-submission-time branch September 9, 2022 23:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants